How to Overcome the Domain Barriers in Pattern-Based Machine Translation System
نویسندگان
چکیده
One of difficult issues in pattern-based machine translation system is maybe to find how to overcome the domain difference in adapting a system from one domain to other domain. This paper describes how we have resolved such barriers among domains as default target word of any domain, domain-specific patterns, and domain adaptation of engine modules in pattern-based machine translation system, especially English-Korean pattern-based machine translation system. For this, we will discuss two types of customization methods which mean a method adapting an existing system to new domain. One is the pure customization method introduced for patent machine translation system in 2006 and another is the upgraded customization method applied to scientific paper machine translation system in 2007. By introducing an upgraded customization method, we could implement a practical machine translation system for scientific paper translation within 8 months, in comparison with the patent machine translation system that was completed even in 24 months by the pure customization method. The translation accuracy of scientific paper machine translation system also rose 77.25% to 81.10% in spite of short term of 8 months.
منابع مشابه
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
متن کاملبرچسبزنی خودکار نقشهای معنایی در جملات فارسی به کمک درختهای وابستگی
Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
متن کاملAn Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...
متن کاملExperience of Health Leadership in Partnering With University-Based Researchers in Canada – A Call to “Re-imagine” Research
Background Emerging evidence that meaningful relationships with knowledge users are a key predictor of research use has led to promotion of partnership approaches to health research. However, little is known about health system experiences of collaborations with university-based researchers, particularly with research partnerships in the area of health system design and health service org...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کامل